SIMULATE PROGRAM EXECUTION AND GENERATE TRACE FILE BLOCKS

GENERATE SOURCE CODE EXPRESSION BLOCKS FROM TRACE FILE BLOCKS

GENERATE MINIMAL TIMING, COMPILED EXPRESSION BLOCKS

LINK MINIMAL TIMING, COMPILED EXPRESSION BLOCKS TO USER PROGRAM






Self_Tuning_Array A(64,64), B(64,64), C(64,64);
Index I(64), J(64);

A[I][J] = B[I+1][J] + B[I-1][J];
C[I][J] = A[I][J+1] - A[I][J-1];
1<1>(0,0) = 2<1>(1,0) + 2<1>(-1,0)
3<1>(0,0) = 1<1>(0,1) - 1<1>(0,-1)
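The two trace lines appear to restate the source expressions using array numbers and index offsets: array 1 at offset (0,0) is assigned array 2 at (1,0) plus array 2 at (-1,0). A minimal sketch of emitting that layout follows. The names `TraceOperand`, `format_operand`, and `format_trace_line` are hypothetical, and the `<1>` field is carried as an opaque tag because the figure does not say what it encodes.

```cpp
#include <sstream>
#include <string>

// Hypothetical operand record. array_id is the array number, tag is the
// value printed inside <...> (meaning not given in the source), and
// di/dj are the offsets applied to the indices I and J.
struct TraceOperand {
    int array_id;
    int tag;
    int di;
    int dj;
};

// Render one operand in the "id<tag>(di,dj)" form seen in the trace lines.
std::string format_operand(const TraceOperand& op) {
    std::ostringstream out;
    out << op.array_id << '<' << op.tag << ">(" << op.di << ',' << op.dj << ')';
    return out.str();
}

// Render "dst = lhs <op> rhs" for a binary expression.
std::string format_trace_line(const TraceOperand& dst, const TraceOperand& lhs,
                              char op, const TraceOperand& rhs) {
    return format_operand(dst) + " = " + format_operand(lhs) + ' ' + op + ' ' +
           format_operand(rhs);
}
```

Under these assumptions, the first source statement maps to destination `{1, 1, 0, 0}` with operands `{2, 1, 1, 0}` and `{2, 1, -1, 0}` joined by `'+'`.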
void e122(STArray* a1, STArray* a2, STArray* a3, int B, int U)
{
  // initialize t1 -> t6 with Array data & offsets
  for( int ii = 0; ii < B; ii++ ) { // block with B
    for( int jj = 0; jj < B; jj++ ) {
      for( int i = ii*B; i < (ii+1)*B; i++ ) {
        for( int k = 0; k < U; k++ ) {
          for( int j = jj*B; j < (jj+1)*B; j += U )
            *t1++ = *t2++ + *t3++;
          *t4++ = *t5++ + *t6++; // unroll U times
        }
      }
    }
  }
}
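The listing above is schematic, so a self-contained sketch of the blocking idea may help. It applies the first source expression, A[I][J] = B[I+1][J] + B[I-1][J], tile by tile with a runtime block size. `Grid` and `blocked_stencil` are hypothetical stand-ins for `STArray` and the generated function. Leaving the border rows untouched is an assumption, since the figure does not show boundary handling, and the unroll factor U is omitted here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical dense, row-major n x n grid standing in for STArray.
struct Grid {
    std::size_t n;
    std::vector<double> data;
    explicit Grid(std::size_t n_) : n(n_), data(n_ * n_, 0.0) {}
    double& at(std::size_t i, std::size_t j) { return data[i * n + j]; }
    double at(std::size_t i, std::size_t j) const { return data[i * n + j]; }
};

// Computes a(i,j) = b(i+1,j) + b(i-1,j) for interior rows, visiting the
// grid in block x block tiles. The first and last rows are not written.
void blocked_stencil(Grid& a, const Grid& b, std::size_t block) {
    assert(block > 0 && a.n == b.n);
    const std::size_t n = a.n;
    for (std::size_t ii = 1; ii + 1 < n; ii += block) {
        const std::size_t i_end = std::min(ii + block, n - 1);
        for (std::size_t jj = 0; jj < n; jj += block) {
            const std::size_t j_end = std::min(jj + block, n);
            for (std::size_t i = ii; i < i_end; ++i) {
                for (std::size_t j = jj; j < j_end; ++j) {
                    a.at(i, j) = b.at(i + 1, j) + b.at(i - 1, j);
                }
            }
        }
    }
}
```

Clamping each tile with `std::min` lets the block size be any positive value, including one that does not divide the grid size, which matters when candidate block sizes are chosen by search.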
e122(a1, a2, a3, optimalB, optimalU);
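The figure says only that the linked block is the one with minimal timing. It does not show how `optimalB` and `optimalU` are found. One plausible approach, sketched here as an assumption, is to time every candidate pair once and keep the fastest. `TuningChoice` and `pick_fastest` are hypothetical names. A real tuner would likely repeat each measurement and discard warm-up runs.

```cpp
#include <chrono>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical result record: the chosen parameters and their measured time.
struct TuningChoice {
    std::size_t block;
    std::size_t unroll;
    double seconds;
};

// Runs kernel(block, unroll) once per candidate pair and returns the pair
// with the smallest wall-clock time. If either candidate list is empty,
// the returned record keeps seconds == infinity.
TuningChoice pick_fastest(
    const std::vector<std::size_t>& blocks,
    const std::vector<std::size_t>& unrolls,
    const std::function<void(std::size_t, std::size_t)>& kernel) {
    TuningChoice best{0, 0, std::numeric_limits<double>::infinity()};
    for (std::size_t b : blocks) {
        for (std::size_t u : unrolls) {
            const auto start = std::chrono::steady_clock::now();
            kernel(b, u);
            const std::chrono::duration<double> elapsed =
                std::chrono::steady_clock::now() - start;
            if (elapsed.count() < best.seconds) {
                best = TuningChoice{b, u, elapsed.count()};
            }
        }
    }
    return best;
}
```

The returned `block` and `unroll` values would then play the role of `optimalB` and `optimalU` in the call above.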